3 research outputs found

    nn-X - a hardware accelerator for convolutional neural networks

    Convolutional neural networks (ConvNets) are hierarchical models of the mammalian visual cortex. These models have been increasingly used in computer vision to perform object recognition and full scene understanding. ConvNets consist of multiple layers that contain groups of artificial neurons, which are mathematical approximations of biological neurons. A ConvNet can consist of millions of neurons and require billions of computations to produce one output.

    Currently, giant server farms are used to process this information in real time. These supercomputers require a large amount of power and a constant link to the end user. Low-power embedded systems are not able to run convolutional neural networks in real time, so using them on mobile platforms, or on platforms where a connection to an off-site server is not guaranteed, is infeasible.

    In this work we present nn-X, a scalable hardware architecture capable of processing ConvNets in real time. We evaluate the performance and power consumption of this architecture and compare it with systems typically used to process convolutional neural networks. Our system is prototyped on the Xilinx Zynq XC7Z045 device, on which we achieve a peak performance of 227 GOPs/s and a measured performance of up to 200 GOPs/s while consuming less than 3 W of power. This translates to a performance-per-power improvement of up to 10 times over conventional embedded systems and up to 25 times over high-performance systems such as desktops and GPUs.
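    As a rough worked example of the performance-per-power claim above, the sketch below (Python) computes nn-X's efficiency from the figures quoted in the abstract (200 GOPs/s measured at under 3 W) and back-calculates the baseline efficiencies that the claimed 10x and 25x improvements would imply. The baseline values are inferred for illustration only; they are not reported in the abstract, and treating the improvement factors as ratios of GOPs/s per watt is an assumption.

        # Back-of-the-envelope sketch of the performance-per-power comparison.
        # Only the nn-X figures come from the abstract; the baselines are
        # inferred from the claimed 10x / 25x improvement factors (assumption).

        NNX_THROUGHPUT_GOPS = 200.0   # measured throughput from the abstract (GOPs/s)
        NNX_POWER_W = 3.0             # upper bound on power from the abstract (W)

        nnx_efficiency = NNX_THROUGHPUT_GOPS / NNX_POWER_W   # ~66.7 GOPs/s per watt

        # Implied baselines, assuming the improvement factors are ratios of
        # GOPs/s-per-watt (not stated explicitly in the abstract).
        implied_embedded = nnx_efficiency / 10                # ~6.7 GOPs/s per watt
        implied_desktop_gpu = nnx_efficiency / 25             # ~2.7 GOPs/s per watt

        print(f"nn-X:                  {nnx_efficiency:.1f} GOPs/s/W")
        print(f"embedded (implied):    {implied_embedded:.1f} GOPs/s/W")
        print(f"desktop/GPU (implied): {implied_desktop_gpu:.1f} GOPs/s/W")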

    Snowflake: A Model Agnostic Accelerator for Convolutional Neural Networks

    Deep learning is becoming increasingly popular for a wide variety of applications, including object detection, classification, semantic segmentation and natural language processing. Convolutional neural networks (CNNs) are a class of deep learning algorithms that have been shown to achieve high accuracy for these tasks. CNNs are hierarchical mathematical models that require millions of operations to produce an output. This output can be used to identify what objects the input image contained, the locations of these objects and what actions to take based on this knowledge. The high computational complexity combined with the inherent parallelism in these models makes them an excellent target for custom accelerators. CNNs produce tens of megabytes of intermediate data and have highly varied data access patterns, both among different network architectures and across the hierarchy of a single network. These variations make it difficult for a custom accelerator to achieve close to 100% computational efficiency across different CNN hierarchies and data access patterns. We present a network-architecture-agnostic accelerator called Snowflake, designed to achieve close to peak computational efficiency while processing all parts of the CNN hierarchy. We use a benchmark suite consisting of a variety of recent, deep CNNs with different model hierarchies to evaluate the performance of the accelerator. We demonstrate that Snowflake has an average computational efficiency of 93%, with a minimum of 91% and a maximum of 95%.
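    To make the efficiency metric above concrete, the sketch below (Python) computes computational efficiency as achieved throughput divided by peak throughput for a single convolutional layer. The layer shape, MAC-unit count, clock rate and overhead factor are hypothetical placeholders chosen for illustration; they are not parameters of Snowflake or of any benchmark in the abstract.

        # Minimal sketch: computational efficiency of an accelerator on one conv layer,
        # defined as achieved throughput divided by peak throughput.
        # All hardware and layer parameters below are hypothetical.

        # Hypothetical conv layer: 3x3 kernel, 64 input channels, 128 output channels,
        # producing a 56x56 output feature map.
        k, c_in, c_out, h_out, w_out = 3, 64, 128, 56, 56
        macs = k * k * c_in * c_out * h_out * w_out          # multiply-accumulates
        ops = 2 * macs                                       # count multiply and add separately

        # Hypothetical accelerator: 256 MAC units clocked at 250 MHz.
        mac_units, clock_hz = 256, 250e6
        peak_ops_per_s = 2 * mac_units * clock_hz            # 128 GOPs/s peak

        # Suppose the layer finishes with 5% cycle overhead (made-up figure).
        measured_cycles = 1.05 * macs / mac_units
        achieved_ops_per_s = ops / (measured_cycles / clock_hz)

        efficiency = achieved_ops_per_s / peak_ops_per_s
        print(f"computational efficiency: {efficiency:.1%}")  # ~95% in this made-up case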